6.86 < true mean < 9.76
This 95% confidence interval indicates that we are 95% confident that the true population mean falls within the calculated range, between 6.9 and 9.8.
#load the tidyverse (provides %>%, summarize(), read_csv(), forcats and ggplot2)
library(tidyverse)
Genes <- read.csv("GeneRegulation.csv")
GenesCI <- Genes %>%
  #find the average/mean of the sample
  summarize(avg_gene = mean(ngenes),
            #find the standard deviation of the sample
            sd_gene = sd(ngenes),
            #calculate the standard error (n = 109 observations)
            std_err = sd_gene/(sqrt(109)),
            #find the approximate 95% Confidence Interval (mean +/- 2 standard errors)
            CI_lower = avg_gene - (2*std_err),
            CI_upper = avg_gene + (2*std_err))
GenesCI
## avg_gene sd_gene std_err CI_lower CI_upper
## 1 8.311927 7.551916 0.7233423 6.865242 9.758611
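This is not the correct interpretation of the confidence interval, because a confidence interval is not a probability statement about the true mean. The true mean is a fixed value; the 95% describes how often intervals built this way would capture it across repeated samples.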
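The repeated-sampling reading of a confidence interval can be checked with a small simulation. The sketch below is only an illustration: the population mean and standard deviation are hypothetical values chosen to resemble the gene data, and the number of repetitions is arbitrary. It draws many samples of size 109, builds a mean +/- 2 standard error interval from each, and counts how often the interval captures the known population mean.

```r
set.seed(42)          # arbitrary seed so the sketch is reproducible
true_mean <- 8        # hypothetical population mean (assumption)
true_sd   <- 7.5      # hypothetical population standard deviation (assumption)
n         <- 109      # same sample size as the gene data
n_reps    <- 2000     # number of simulated samples (arbitrary)

# For each simulated sample, build the interval and record whether it
# contains the true mean
covered <- replicate(n_reps, {
  x     <- rnorm(n, mean = true_mean, sd = true_sd)
  se    <- sd(x) / sqrt(n)
  lower <- mean(x) - 2 * se
  upper <- mean(x) + 2 * se
  lower < true_mean && true_mean < upper
})

# Proportion of intervals that captured the true mean
coverage <- mean(covered)
coverage
```

The proportion should land close to 0.95. Any single interval either contains the true mean or it does not; the 95% belongs to the procedure, not to one interval.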
Mean: 70.1
Standard Deviation: 48.5
Standard Error: 15.3
95% Confidence Interval: 39.4 < true mean < 100.8
d. With a larger sample size, I would expect the mean to remain within the same range as in this smaller sample.
e. Assuming a normal distribution, I would not expect the standard deviation to change systematically with sample size. The equation for standard deviation divides the sum of squared deviations by n - 1, and dividing by a larger number produces a smaller quotient, but the sum of squared deviations also grows as observations are added. As the sample mean (x) approaches the population mean, the two effects even each other out and the sample standard deviation settles near the population standard deviation.
f. Increasing sample size increases the precision of our sampling distribution, so I would expect the standard error to decrease with increasing sample size.
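The effect of sample size can be checked with a small simulation. The sketch below assumes a hypothetical normal population (mean 70, standard deviation 50, chosen only to resemble the beetle counts) and a set of arbitrary sample sizes. For each size it reports the sample mean, standard deviation, and standard error.

```r
set.seed(1)                         # arbitrary seed for reproducibility
pop_mean <- 70                      # hypothetical population mean (assumption)
pop_sd   <- 50                      # hypothetical population sd (assumption)
sizes    <- c(10, 100, 1000, 10000) # sample sizes to compare (arbitrary)

# Draw one sample per size and summarize it
results <- do.call(rbind, lapply(sizes, function(n) {
  x <- rnorm(n, mean = pop_mean, sd = pop_sd)
  data.frame(n    = n,
             mean = mean(x),
             sd   = sd(x),
             se   = sd(x) / sqrt(n))
}))
results
```

In runs like this the standard error falls steadily as n grows, while the sample mean and sample standard deviation settle near the population values rather than shrinking toward zero.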
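Flowers <- read.csv("Corpseflowers.csv")
Flower_Stats <- Flowers %>%
  summarize(avg_flowers = mean(numberOfBeetles),
            sd_flowers = sd(numberOfBeetles),
            #standard error of the mean (n = 10 observations)
            SE_flowers = sd_flowers/(sqrt(10)),
            #despite the column names, these are the approximate 95% confidence limits (mean +/- 2 SE), not quartiles
            Upper_Quartile = avg_flowers + (2*SE_flowers),
            Lower_Quartile = avg_flowers - (2*SE_flowers))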
Flower_Stats
## avg_flowers sd_flowers SE_flowers Upper_Quartile Lower_Quartile
## 1 70.1 48.50074 15.33728 100.7746 39.42544
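With only 10 observations, the multiplier of 2 is a rough approximation. A minimal sketch of the same interval using the t distribution is shown below, as an alternative to the mean +/- 2 SE rule used above. It reuses the summary values printed in the output; the variable names are illustrative only.

```r
avg <- 70.1       # sample mean from the output above
s   <- 48.50074   # sample standard deviation from the output above
n   <- 10         # number of observations

se     <- s / sqrt(n)             # standard error of the mean
t_crit <- qt(0.975, df = n - 1)   # two-sided 95% critical value, about 2.26 for 9 df

# Interval based on the t distribution
ci_t <- c(lower = avg - t_crit * se, upper = avg + t_crit * se)

# Interval based on the 2 * SE rule of thumb, for comparison
ci_2se <- c(lower = avg - 2 * se, upper = avg + 2 * se)

ci_t
ci_2se
```

Because the critical value is larger than 2 at this sample size, the t-based interval is somewhat wider than the rule-of-thumb interval.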
The months are not in calendar order. fct_inorder() orders the levels by their first appearance in the data, and the data appear to have been collected across different years rather than starting in January.
Sea_Ice <- read_csv("NH_Sea_Ice.csv")
## Parsed with column specification:
## cols(
## Year = col_integer(),
## Month = col_integer(),
## Day = col_integer(),
## Extent = col_double(),
## Missing = col_integer(),
## Month_Name = col_character()
## )
Sea_Ice_Data <- Sea_Ice %>%
  mutate(Month_Name = factor(Month_Name)) %>%
  mutate(Month_Name = forcats::fct_inorder(Month_Name))
levels(Sea_Ice_Data$Month_Name)
## [1] "Nov" "Dec" "Feb" "Mar" "Jun" "Jul" "Sep" "Oct" "Jan" "Apr" "May"
## [12] "Aug"
?levels
fct_rev() reverses the order of the factor levels, so on this dataset it outputs the factor levels from the above question in reverse order.
fct_relevel() allows us to move levels around. With my code I moved August, February, and January to the front to see that the listed levels appear first, in the order given in my code.
fct_recode() allows the user to change level names by hand. In my code I replaced “Feb” with “Month”.
levels(fct_rev(Sea_Ice_Data$Month_Name))
## [1] "Aug" "May" "Apr" "Jan" "Oct" "Sep" "Jul" "Jun" "Mar" "Feb" "Dec"
## [12] "Nov"
?fct_rev
levels(fct_relevel(Sea_Ice_Data$Month_Name, "Aug", "Feb", "Jan"))
## [1] "Aug" "Feb" "Jan" "Nov" "Dec" "Mar" "Jun" "Jul" "Sep" "Oct" "Apr"
## [12] "May"
?fct_relevel
levels(fct_recode(Sea_Ice_Data$Month_Name, Month = "Feb"))
## [1] "Nov" "Dec" "Month" "Mar" "Jun" "Jul" "Sep" "Oct"
## [9] "Jan" "Apr" "May" "Aug"
?fct_recode
Mutate month name to get months in the right order, from January to December. Show that it worked with levels()
Sea_Ice_Data <- Sea_Ice_Data %>%
  mutate(Month_Name = fct_relevel(Month_Name,
                                  "Jan", "Feb", "Mar", "Apr", "May",
                                  "Jun", "Jul", "Aug", "Sep", "Oct",
                                  "Nov", "Dec"))
levels(Sea_Ice_Data$Month_Name)
## [1] "Jan" "Feb" "Mar" "Apr" "May" "Jun" "Jul" "Aug" "Sep" "Oct" "Nov"
## [12] "Dec"
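As an alternative to listing every month by hand, base R ships the constant month.abb, which holds the twelve abbreviated English month names in calendar order. The sketch below is a base-R variant, not the forcats approach used above; the toy vector simply repeats the out-of-order levels printed earlier.

```r
# Toy vector with the months in the order they first appeared in the data
month_name <- c("Nov", "Dec", "Feb", "Mar", "Jun", "Jul",
                "Sep", "Oct", "Jan", "Apr", "May", "Aug")

# Build the factor with calendar-ordered levels taken from month.abb
month_factor <- factor(month_name, levels = month.abb)

levels(month_factor)
```

This only works when the values match month.abb exactly; any value that does not match would become NA, so it is worth checking with anyNA() afterwards.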
#Make a column called Season that is a copy of Month_Name.
Sea_Ice_Data <- Sea_Ice_Data %>%
  mutate(Season = fct_recode(Month_Name,
                             Winter = "Jan", Winter = "Feb",
                             Spring = "Mar", Spring = "Apr", Spring = "May",
                             Summer = "Jun", Summer = "Jul", Summer = "Aug",
                             Fall = "Sep", Fall = "Oct", Fall = "Nov",
                             Winter = "Dec"))
#Give the factors levels for Season.
#Show the levels
levels(Sea_Ice_Data$Season)
## [1] "Winter" "Spring" "Summer" "Fall"
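The same month-to-season grouping can be sketched in base R by assigning a named list to levels(), where each list name is a new level and each element lists the old levels to merge. This is a base-R alternative, not the fct_recode() call used above (forcats also offers fct_collapse() for this kind of grouping); the toy factor is an assumption standing in for the Month_Name column.

```r
# Toy factor with one value per month, in calendar order
months <- factor(month.abb, levels = month.abb)

# Copy it, then merge the twelve month levels into four season levels
season <- months
levels(season) <- list(
  Winter = c("Dec", "Jan", "Feb"),
  Spring = c("Mar", "Apr", "May"),
  Summer = c("Jun", "Jul", "Aug"),
  Fall   = c("Sep", "Oct", "Nov")
)

levels(season)
table(season)
```

The level order follows the order of the list, so the seasons come out as Winter, Spring, Summer, Fall.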
Box_Plot <- ggplot(data = Sea_Ice_Data,
mapping = aes(x = Month_Name, y = Extent))+
geom_boxplot()
Box_Plot
#Find the annual minimum by grouping by Year and summarizing the minimum of Extent within each year
Ice_Stats <- Sea_Ice_Data %>%
group_by(Year) %>%
summarize(minimum = min(Extent))
#Make the plot
ggplot(Ice_Stats,
aes(x = Year, y = minimum)) +
geom_point()+
stat_smooth(method = lm)
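stat_smooth(method = lm) draws a fitted line but does not report its slope. A minimal sketch of how the slope could be extracted with lm() is shown below. The data frame here is synthetic, generated only so the block runs on its own; it is not the sea ice data, and the numbers it produces say nothing about the real trend. With the real data, Ice_Stats would take the place of toy.

```r
set.seed(7)   # arbitrary seed for reproducibility

# Synthetic stand-in for Ice_Stats (NOT real sea ice values)
toy <- data.frame(Year = 1979:2016)
toy$minimum <- 7.5 - 0.08 * (toy$Year - 1979) + rnorm(nrow(toy), sd = 0.4)

# Fit the same straight line that stat_smooth(method = lm) would draw
fit <- lm(minimum ~ Year, data = toy)

slope    <- coef(fit)[["Year"]]   # estimated change in minimum per year
slope_ci <- confint(fit, "Year")  # 95% confidence interval for the slope

slope
slope_ci
```

A negative slope whose confidence interval excludes zero would indicate a declining annual minimum over the period.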
Using n = 4 just split the months into four groups that did not line up with the seasons, so I facet-wrapped by the Season column instead.
graph_by_year <- ggplot(data = Sea_Ice_Data,
mapping = aes(x=Year, y=Extent,
color = Month, group = Month)) +
#add the lines
geom_line()
#Now facet wrap to add Seasons
graph_by_year + facet_wrap(~Season)
#wes_palette() comes from the wesanderson package
library(wesanderson)
graph_by_month <- ggplot(data = Sea_Ice_Data,
                         mapping = aes(x = Month_Name, y = Extent,
                                       group = Year, color = Year)) +
  geom_line() +
  scale_color_gradientn(colors = wes_palette("Zissou1")) +
  #hide the color legend (the aesthetic is named color, not colors)
  guides(color = "none") +
  theme_dark() +
  labs(x = "Month",
       y = "Sea Ice Extent",
       title = "Arctic Sea Ice 1978-2016")
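Arctic_Ice <- graph_by_month + theme(panel.grid.major = element_blank())
Arctic_Ice
#transition_time() and transition_reveal() come from the gganimate package
library(gganimate)
#show one year at a time
Arctic_Ice +
  transition_time(Year)
#reveal the lines cumulatively along Year (current gganimate versions take only the along variable)
Arctic_Ice +
  transition_reveal(Year)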